Dependency Structure Trees in Syntax Based Machine Translation
Author
Abstract
Machine Translation (MT) has long been an unsolved problem, and all the more an interesting and engaging one. MT deals with the translation of a sentence in a source language into a sentence in the target language, preserving the meaning in its full detail. This requires the computer to encode the knowledge of both languages in a representation that can be used at runtime to translate a given input. In earlier years, the translation problem was approached in three main ways. First, memorizing all the source and target sentences ahead of time, called Translation Memory (TM), and reproducing the translation at runtime by a simple lookup process (Hutchins and Somers 1992). Second, using complete source-language knowledge for analysis and generating the translation according to the syntax of the target language (Nirenburg et al. 1992). Third, a more semantically motivated approach, oriented toward multiple languages simultaneously, that projects the task of translation into a common space with a uniform representation of knowledge called Interlingua (Hutchins and Somers 1992). Although these approaches have been experimented with in great detail over the last two to three decades, a more promising approach called Statistical Machine Translation (SMT), with firm support from statistical and mathematical grounds, has taken prominence in the last decade (Brown et al. 1993; Koehn, Och, and Marcu 2003). SMT approaches use massive amounts of corpora to learn translation models at the sub-sentential level that can generalize well to unseen data, unlike TM. SMT addresses the problem of translation as a noisy-channel paradigm, where the channel model is usually called the 'translation model' and the source model is called the 'language model'. The translation model is estimated under a generative story for word correspondences. The language model is estimated as an n-gram sequence model with Markov assumptions. This formulation of the translation problem is language-agnostic and does not assume any kind of syntax information from either the source or the target. Translation output produced under the SMT framework tends to be fragmented and context-insensitive. With rigorous estimation techniques and heuristics for the incorporation of context (Koehn, Och, and Marcu 2003), we have seen improved performance over the past few years, but the quality is still far from human-consistent. This is primarily because these models are learnt in no-syntax scenarios and so are ill-informed about the divergences that occur across languages. A recent body of approaches has looked into the incorporation of syntax at various phases of the translation process with reasonable success (Yamada and Knight 2001; Chiang 2005). With more researchers looking into intelligent ways
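As a point of reference for the noisy-channel paradigm mentioned above, the decomposition can be written as a short derivation; this is a minimal sketch in standard notation (f for the source sentence, e for the target sentence, neither symbol appears in the abstract itself):

\hat{e} = \arg\max_{e} P(e \mid f) = \arg\max_{e} P(f \mid e)\, P(e)

where P(f | e) is the channel (translation) model and P(e) is the source (language) model. Under the n-gram Markov assumption the language model factors as

P(e) = \prod_{i=1}^{|e|} P(e_i \mid e_{i-n+1}, \ldots, e_{i-1}),

which makes clear that neither factor encodes syntactic structure from either language.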
Similar resources
Better Learning and Decoding for Syntax Based SMT Using PSDIG
As an approach to syntax based statistical machine translation (SMT), Probabilistic Synchronous Dependency Insertion Grammars (PSDIG), introduced in (Ding and Palmer, 2005), are a version of synchronous grammars defined on dependency trees. In this paper we discuss better learning and decoding algorithms for a PSDIG MT system. We introduce two new grammar learners: (1) an exhaustive learner com...
Machine Translation Using Probabilistic Synchronous Dependency Insertion Grammars
Syntax-based statistical machine translation (MT) aims at applying statistical models to structured data. In this paper, we present a syntax-based statistical machine translation system based on a probabilistic synchronous dependency insertion grammar. Synchronous dependency insertion grammars are a version of synchronous grammars defined on dependency trees. We first introduce our approach to ...
EBMT for SMT: A New EBMT-SMT Hybrid
We propose a new framework for the hybridisation of Example-Based and Statistical Machine Translation (EBMT and SMT) systems. We add new functionality to Moses to allow it to work effectively with an EBMT system. Within this framework, we investigate the use of two types of EBMT system. The first uses string-based matching, and we investigate several variations, but find that the hybrid system ...
Improved Neural Machine Translation with Source Syntax
Neural Machine Translation (NMT) based on the encoder-decoder architecture has recently achieved the state-of-the-art performance. Researchers have proven that extending word level attention to phrase level attention by incorporating source-side phrase structure can enhance the attention model and achieve promising improvement. However, word dependencies that can be crucial to correctly underst...
Dependency Tree Abstraction for Long-Distance Reordering in Statistical Machine Translation
Word reordering is a crucial technique in statistical machine translation in which syntactic information plays an important role. Synchronous context-free grammar has typically been used for this purpose with various modifications for adding flexibilities to its synchronized tree generation. We permit further flexibilities in the synchronous context-free grammar in order to translate between la...
Phrase Dependency Machine Translation with Quasi-Synchronous Tree-to-Tree Features
Recent research has shown clear improvement in translation quality by exploiting linguistic syntax for either the source or target language. However, when using syntax for both languages (“tree-to-tree” translation), there is evidence that syntactic divergence can hamper the extraction of useful rules (Ding and Palmer 2005). Smith and Eisner (2006) introduced quasi-synchronous grammar, a formal...
Journal title:
Volume / Issue:
Pages: -
Publication date: 2008